cluster: wait for all servers closing before disconnect #1400
Conversation
/cc @bnoordhuis
// Wait for any data, then close connection
socket.on('data', socket.end.bind(socket));

}).listen(common.PORT, '127.0.0.1');
Could you change 127.0.0.1 to common.localhostIPv4?
@brendanashworth done
@@ -625,12 +625,26 @@ function workerInit() {

  Worker.prototype.disconnect = function() {
    this.suicide = true;
    var waitingHandles = 0;

    function check() {
Can you please use a more descriptive name? Maybe checkRemainingHandles.
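For reference, a sketch of how the renamed helper could sit inside Worker.prototype.disconnect, assembled from the fragments visible in this diff; the iteration over the module's internal handles map is an assumption, not a verbatim copy of the patch.

// Sketch only: close every listening handle and call process.disconnect()
// after the last close callback has fired. `handles` is assumed to be the
// cluster module's internal map of listening handles.
Worker.prototype.disconnect = function() {
  this.suicide = true;
  var waitingHandles = 0;

  function checkRemainingHandles() {
    waitingHandles--;
    if (waitingHandles === 0)
      process.disconnect();
  }

  for (var key in handles) {
    var handle = handles[key];
    delete handles[key];
    waitingHandles++;
    handle.owner.close(checkRemainingHandles);
  }

  // No listening handles at all: disconnect right away.
  if (waitingHandles === 0)
    process.disconnect();
};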
@cjihrig the tests are the same, except for the scheduling policy. But there is a note in the
@cjihrig is this (still?) lgty?
@Fishrock123 the code changes themselves look good to me. I don't love the tests. Seems like a single test should be adequate, and it would be nice to simplify the test a bit if possible.
@cjihrig I think two tests are required to ensure both scheduling policies are working correctly.
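For what it's worth, a test can pin the policy explicitly before the first fork(); a minimal sketch, not the actual test files from this PR:

// Pin the cluster scheduling policy under test. This must be set before the
// first fork(), or exported via the NODE_CLUSTER_SCHED_POLICY environment
// variable.
var cluster = require('cluster');

cluster.schedulingPolicy = cluster.SCHED_NONE;  // or cluster.SCHED_RR

if (cluster.isMaster) {
  cluster.fork();
} else {
  // Worker runs under the pinned policy.
  process.exit(0);
}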
I don't think the code being tested here is really reliant on the scheduling policy. Granted, the
fwiw, I agree with @cjihrig about the tests. As far as this change goes, it looks OK to me, though I haven't run or tested it. In particular, it looks like the ProgressTracker was lost in the cluster rewrite: https://github.com/joyent/node/blob/v0.10.38-release/lib/cluster.js#L520-L528, and this PR brings something like it back.
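For context, the general shape of such a tracker, written here from memory as a rough sketch rather than a verbatim copy of the linked v0.10 code:

// Counts outstanding asynchronous operations and fires a callback once all
// of them have reported back (names approximate).
function ProgressTracker(missing, callback) {
  this.missing = missing;
  this.callback = callback;
}

ProgressTracker.prototype.done = function() {
  this.missing -= 1;
  this.check();
};

ProgressTracker.prototype.check = function() {
  if (this.missing === 0) this.callback();
};

// Usage: wait for two servers to finish closing before moving on.
var net = require('net');
var tracker = new ProgressTracker(2, function() {
  console.log('all servers closed');
});
var a = net.createServer().listen(0, function() {
  a.close(function() { tracker.done(); });
});
var b = net.createServer().listen(0, function() {
  b.close(function() { tracker.done(); });
});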
@sam-github @cjihrig now there is only one test. If it is OK now, I'll rebase/squash to a single commit.
Please squash. I just ran the test _without_ your changes, and the test passes when it should not. Can you check this?
Fix for iojs/io.js#1305. Before this, cluster does not behave the way it is documented. When disconnect is triggered, the worker must wait for every server to be closed before actually disconnecting. See the test case and discussion in the above-mentioned issue.
Force-pushed from a923192 to 35854ae.
@sam-github commits squashed, test fixed.
});

process.once('exit', function() {
  assert.ok(checks.disconnectedOnClientsEnd, 'The worker disconnected before all clients are ended');
Run make lint, this line is too long.
I would just do the lint fixups and merge, but I'd like you to confirm that I'm not missing some subtlety about the exit event testing.
@sam-github Yes, exit event testing is not necessary. It is copy-paste from another cluster test (
Before this, cluster did not behave the way it is documented. When disconnect is triggered, the worker must wait for every server to be closed before actually disconnecting.

Reviewed-By: Sam Roberts <vieuxtech@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
PR-URL: nodejs#1400
Fixes: nodejs#1305
@sam-github I'm sorry, but the tests are now broken. First, if I run the test without my fixes, it will run forever. But if I set the scheduling policy to This is because of the default When the scheduling policy is
@sam-github I think this probably needed a CI run, you have access to the CI, right? Anyways, here's a run on current master: https://jenkins-iojs.nodesource.com/view/iojs/job/iojs+any-pr+multi/788/
}
process.disconnect();

if (waitingHandles === 0) {
Why is this block necessary? Won't checkRemainingHandles itself handle this?
What if we don't have any handles yet?
Cool. Thanks for clarifying :-)
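To make that case concrete, a small hedged example: a worker that never listens on anything still has to reach process.disconnect(), and only the trailing guard gets it there.

var cluster = require('cluster');

if (cluster.isMaster) {
  var worker = cluster.fork();
  worker.on('disconnect', function() {
    console.log('worker disconnected without ever listening');
  });
} else {
  // No servers are created here, so the loop over handles never increments
  // waitingHandles; only the trailing `if (waitingHandles === 0)` check
  // calls process.disconnect().
  cluster.worker.disconnect();
}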
Wait for data to arrive from the worker before doing a disconnect. Without this, whether the disconnect arrives at the worker before the master accepts and forwards the connection descriptor to the worker is a race.

Reviewed-By: Sam Roberts <vieuxtech@gmail.com>
Reviewed-By: Johan Bergström <bugs@bergstroem.nu>
Reviewed-By: Rod Vagg <rod@vagg.org>
PR-URL: #1953
Fixes: #1933
Fixes: #1400
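A hedged sketch of the pattern that commit describes; the IPC message names and the port handoff are assumptions, not the actual test code:

var cluster = require('cluster');
var net = require('net');

if (cluster.isMaster) {
  var worker = cluster.fork();
  worker.on('message', function(msg) {
    if (msg === 'got connection') {
      // The worker has really received the forwarded connection, so the
      // disconnect can no longer race ahead of the descriptor handoff.
      worker.disconnect();
    } else if (msg && msg.port) {
      // The worker's server is listening; connect and send a byte so the
      // worker can close the connection afterwards.
      net.connect(msg.port, function() { this.end('x'); });
    }
  });
} else {
  var server = net.createServer(function(socket) {
    process.send('got connection');
    socket.on('data', socket.end.bind(socket));
  });
  server.listen(0, function() {
    process.send({ port: server.address().port });
  });
}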
Fix for #1305
Before this, cluster does not behave the way it is documented.
When disconnect is triggered, the worker must wait for every server to be closed
before actually disconnecting.
See the test case and discussion in the above-mentioned issue.